Processing multiview video

ABSTRACT

Decoding a video signal comprises receiving a bitstream comprising the multiview video signal encoded according to dependency relationships between respective views, and view-dependency data representing the dependency relationships; extracting the view-dependency data and determining the dependency relationships from the extracted data; and decoding the multiview video signal according to the determined dependency relationships using illumination compensation between segments of pictures in respective views, where the multiview video signal includes multiple views each comprising multiple pictures segmented into multiple segments.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Application Ser. No. 60/758,234 filed on Jan. 12, 2006, U.S. Application Ser. No. 60/759,620 filed on Jan. 18, 2006, U.S. Application Ser. No. 60/762,534 filed on Jan. 27, 2006, U.S. Application Ser. No. 60/787,193 filed on Mar. 30, 2006, U.S. Application Ser. No. 60/818,274 filed on Jul. 5, 2006, U.S. Application Ser. No. 60/830,087 filed on Jul. 12, 2006, U.S. Application Ser. No. 60/830,328 filed on Jul. 13, 2006, Korean Application No. 10-2006-0004956 filed on Jan. 17, 2006, Korean Application No. 10-2006-0027100 filed on Mar. 24, 2006, Korean Application No. 10-2006-0037773 filed on Apr. 26, 2006, Korean Application No. 10-2006-0110337 filed on Nov. 9, 2006, and Korean Application No. 10-2006-0110338 filed on Nov. 9, 2006, each of which is incorporated herein by reference.

This application is related to U.S. application Ser. No. 11/622,591 titled “PROCESSING MULTIVIEW VIDEO”, U.S. application Ser. No. 11/622,611 titled “PROCESSING MULTIVIEW VIDEO”, U.S. application Ser. No. 11/622,618 titled “PROCESSING MULTIVIEW VIDEO”, U.S. application Ser. No. 11/622,709 titled “PROCESSING MULTIVIEW VIDEO”, U.S. application Ser. No. 11/622,675 titled “PROCESSING MULTIVIEW VIDEO”, U.S. application Ser. No. 11/622,803 titled “PROCESSING MULTIVIEW VIDEO”, and U.S. application Ser. No. 11/622,681 titled “PROCESSING MULTIVIEW VIDEO”, each of which is being filed concurrently with the present application, and each of which is also incorporated herein by reference.

BACKGROUND

The invention relates to processing multiview video.

Multiview Video Coding (MVC) relates to compression of video sequences (e.g., a sequence of images or “pictures”) that are typically acquired by respective cameras. The video sequences or “views” can be encoded according to a standard such as MPEG. A picture in a video sequence can represent a full video frame or a field of a video frame. A slice is an independently coded portion of a picture that includes some or all of the macroblocks in the picture, and a macroblock includes blocks of picture elements (or “pixels”).

The video sequences can be encoded as a multiview video sequence according to the H.264/AVC codec technology, and many developers are conducting research into amendment of standards to accommodate multiview video sequences.

Three profiles for supporting specific functions are prescribed in the current H.264 standard. The term “profile” indicates the standardization of technical components for use in the video encoding/decoding algorithms. In other words, the profile is the set of technical components prescribed for decoding a bitstream of a compressed sequence, and may be considered to be a sub-standard. The above-mentioned three profiles are a baseline profile, a main profile, and an extended profile. A variety of functions for the encoder and the decoder have been defined in the H.264 standard, such that the encoder and the decoder can be compatible with the baseline profile, the main profile, and the extended profile respectively.

The bitstream for the H.264/AVC standard is structured according to a Video Coding Layer (VCL) for processing the moving-image coding (i.e., the sequence coding), and a Network Abstraction Layer (NAL) associated with a subsystem capable of transmitting/storing encoded information. The output data of the encoding process is VCL data, and is mapped into NAL units before it is transmitted or stored. Each NAL unit includes a Raw Byte Sequence Payload (RBSP) corresponding to either compressed video data or header information.

The NAL unit includes a NAL header and an RBSP. The NAL header includes flag information (e.g., nal_ref_idc) and identification (ID) information (e.g., nal_unit_type). The flag information “nal_ref_idc” indicates the presence or absence of a slice used as a reference picture of the NAL unit. The ID information “nal_unit_type” indicates the type of the NAL unit. The RBSP stores compressed original data. An RBSP trailing bit can be added to the last part of the RBSP, such that the length of the RBSP can be represented by a multiple of 8 bits.
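As a non-limiting illustration, the two header fields named above can be read from the first byte of a NAL unit. The sketch below assumes the one-byte H.264/AVC NAL header layout (one forbidden_zero_bit, two bits of nal_ref_idc, five bits of nal_unit_type); the names NalHeader and parse_nal_header are hypothetical and are used only for this example.

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    """Fields of the one-byte NAL header (hypothetical container type)."""
    forbidden_zero_bit: int
    nal_ref_idc: int
    nal_unit_type: int


def parse_nal_header(first_byte: int) -> NalHeader:
    """Split the first byte of a NAL unit into its three header fields."""
    if not 0 <= first_byte <= 0xFF:
        raise ValueError("a NAL header byte must be in the range 0..255")
    return NalHeader(
        forbidden_zero_bit=(first_byte >> 7) & 0x1,  # most significant bit
        nal_ref_idc=(first_byte >> 5) & 0x3,         # next two bits
        nal_unit_type=first_byte & 0x1F,             # low five bits
    )


# Example: the byte 0x67 (binary 0110 0111) carries nal_ref_idc 3 and
# nal_unit_type 7.
header = parse_nal_header(0x67)
```

A nonzero nal_ref_idc in such a header corresponds to the case in which the NAL unit carries content used for reference.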

There are a variety of NAL units, for example, an Instantaneous Decoding Refresh (IDR) picture, a Sequence Parameter Set (SPS), a Picture Parameter Set (PPS), and Supplemental Enhancement Information (SEI).

The standard has generally defined a target product using various profiles and levels, such that the target product can be implemented with appropriate costs. The decoder satisfies a predetermined constraint at a corresponding profile and level.

The profile and the level are able to indicate a function or parameter of the decoder, such that they indicate which compressed images can be handled by the decoder. Specific information indicating which one of multiple profiles corresponds to the bitstream can be identified by profile ID information. The profile ID information “profile_idc” provides a flag for identifying a profile associated with the bitstream. The H.264/AVC standard includes three profile identifiers (IDs). If the profile ID information “profile_idc” is set to “66”, the bitstream is based on the baseline profile. If the profile ID information “profile_idc” is set to “77”, the bitstream is based on the main profile. If the profile ID information “profile_idc” is set to “88”, the bitstream is based on the extended profile. The above-mentioned “profile_idc” information may be contained in the SPS (Sequence Parameter Set), for example.
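The mapping between profile_idc values and profiles described above can be sketched as a simple lookup. The function name profile_name, the optional extra_profiles argument, and the “unknown” fallback are assumptions made for illustration only; only the values 66, 77, and 88 come from the text above.

```python
from typing import Mapping, Optional

# The three profile identifiers named in the text above.
PROFILE_NAMES = {66: "baseline", 77: "main", 88: "extended"}


def profile_name(profile_idc: int,
                 extra_profiles: Optional[Mapping[int, str]] = None) -> str:
    """Return the profile name for profile_idc, or "unknown".

    extra_profiles is a hypothetical hook through which identifiers
    introduced by an amendment could be supplied by the caller.
    """
    if extra_profiles and profile_idc in extra_profiles:
        return extra_profiles[profile_idc]
    return PROFILE_NAMES.get(profile_idc, "unknown")
```

A decoder following this sketch could, for instance, refuse a bitstream whose profile resolves to “unknown” rather than attempt to decode it.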

SUMMARY

In one aspect, in general, a method for decoding a video signal comprises: receiving a bitstream comprising the video signal encoded according to a first profile that represents a selection from a set of multiple profiles that includes at least one profile for a multiview video signal, and profile information that identifies the first profile; extracting the profile information from the bitstream; and decoding the video signal according to the determined profile using illumination compensation between segments of pictures in respective views when the determined profile corresponds to a multiview video signal with each of multiple views comprising multiple pictures segmented into multiple segments (e.g., an image block segment such as a single block or a macroblock, or a segment such as a slice of an image).

Aspects can include one or more of the following features.

The method further comprises extracting from the bitstream configuration information associated with multiple views when the determined profile corresponds to a multiview video signal, wherein the configuration information comprises at least one of view-dependency information representing dependency relationships between respective views, view identification information indicating a reference view, view-number information indicating the number of views, view level information for providing view scalability, and view-arrangement information indicating a camera arrangement.

The profile information is located in a header of the bitstream.

The view level information corresponds to one of a plurality of levels associated with a hierarchical view prediction structure among the views of the multiview video signal.

The view-dependency information represents the dependency relationships in a two-dimensional data structure.

The two-dimensional data structure comprises a matrix.

The segments comprise image blocks.

Using illumination compensation for a first segment comprises obtaining an offset value for illumination compensation of a neighboring block by forming a sum that includes a predictor for illumination compensation of the neighboring block and a residual value.

The method further comprises selecting at least one neighboring block based on whether one or more conditions are satisfied for a neighboring block in an order in which one or more vertical or horizontal neighbors are followed by one or more diagonal neighbors.

Selecting at least one neighboring block comprises determining whether one or more conditions are satisfied for a neighboring block in the order of: a left neighboring block, followed by an upper neighboring block, followed by a right-upper neighboring block, followed by a left-upper neighboring block.

Determining whether one or more conditions are satisfied for a neighboring block comprises extracting a value associated with the neighboring block from the bitstream indicating whether illumination compensation of the neighboring block is to be performed.

Selecting at least one neighboring block comprises determining whether to use an offset value for illumination compensation of a single neighboring block or multiple offset values for illumination compensation of respective neighboring blocks.

In another aspect, in general, a method for decoding a multiview video signal comprises: receiving a bitstream comprising the multiview video signal encoded according to dependency relationships between respective views, and view-dependency data representing the dependency relationships; extracting the view-dependency data and determining the dependency relationships from the extracted data; and decoding the multiview video signal according to the determined dependency relationships using illumination compensation between segments of pictures in respective views, where the multiview video signal includes multiple views each comprising multiple pictures segmented into multiple segments.

Aspects can include one or more of the following features.

The view-dependency data represents the dependency relationships in a two-dimensional data structure.

The view-dependency data comprises a matrix.
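By way of a hedged example, a view-dependency matrix can be used to derive an order in which views may be decoded. The sketch below assumes a square matrix in which entry [i][j] is nonzero when view i refers to view j; this convention, the function name decoding_order, and the tie-breaking by view index are assumptions for illustration and are not prescribed by the text.

```python
from typing import List, Sequence


def decoding_order(dep: Sequence[Sequence[int]]) -> List[int]:
    """Return view indices ordered so that each view follows its references.

    dep[i][j] is nonzero when view i depends on view j (assumed convention).
    """
    n = len(dep)
    # For every view, the set of reference views that are not yet decoded.
    pending = {i: {j for j in range(n) if dep[i][j]} for i in range(n)}
    order: List[int] = []
    while pending:
        # Views whose references have all been decoded already.
        ready = sorted(v for v, refs in pending.items() if not refs)
        if not ready:
            raise ValueError("dependency relationships contain a cycle")
        for v in ready:
            order.append(v)
            del pending[v]
        for refs in pending.values():
            refs.difference_update(ready)
    return order


# Three views: view 0 is independent, view 2 refers to view 0,
# and view 1 refers to views 0 and 2.
example = [
    [0, 0, 0],
    [1, 0, 1],
    [1, 0, 0],
]
order = decoding_order(example)
```

In this example the independent view is decoded first, then the view that refers only to it, and finally the view that refers to both.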

The method further comprises extracting from the bitstream configuration information comprising at least one of view identification information indicating a reference view, view-number information indicating the number of views, view level information for providing view scalability, and view-arrangement information indicating a camera arrangement.

The segments comprise image blocks.

Using illumination compensation for a first segment comprises obtaining an offset value for illumination compensation of a neighboring block by forming a sum that includes a predictor for illumination compensation of the neighboring block and a residual value.

The method further comprises selecting at least one neighboring block based on whether one or more conditions are satisfied for a neighboring block in an order in which one or more vertical or horizontal neighbors are followed by one or more diagonal neighbors.

Selecting at least one neighboring block comprises determining whether one or more conditions are satisfied for a neighboring block in the order of: a left neighboring block, followed by an upper neighboring block, followed by a right-upper neighboring block, followed by a left-upper neighboring block.

Determining whether one or more conditions are satisfied for a neighboring block comprises extracting a value associated with the neighboring block from the bitstream indicating whether illumination compensation of the neighboring block is to be performed.

Selecting at least one neighboring block comprises determining whether to use an offset value for illumination compensation of a single neighboring block or multiple offset values for illumination compensation of respective neighboring blocks.

The method further comprises, when multiple offset values are to be used, obtaining the predictor for performing illumination compensation of the first block by combining the multiple offset values.

Combining the multiple offset values comprises taking an average or median of the offset values.

In another aspect, in general, for each respective decoding method, a method for encoding a video signal comprises generating a bitstream capable of being decoded into the video signal by the respective decoding method. For example, in another aspect, in general, a method for encoding a bitstream comprises: forming the bitstream according to a first profile that represents a selection from a set of multiple profiles that includes at least one profile for a multiview video signal, and profile information that identifies the first profile; and providing information for illumination compensation between segments of pictures in respective views when the determined profile corresponds to a multiview video signal with each of multiple views comprising multiple pictures segmented into multiple segments. In another aspect, in general, a method for encoding a bitstream comprises: forming the bitstream according to dependency relationships between respective views, and view-dependency data representing the dependency relationships; and providing information for illumination compensation between segments of pictures in respective views when the determined profile corresponds to a multiview video signal with each of multiple views comprising multiple pictures segmented into multiple segments.

In another aspect, in general, for each respective decoding method, a computer program, stored on a computer-readable medium, comprises instructions for causing a computer to perform the respective decoding method.

In another aspect, in general, for each respective decoding method, image data embodied on a machine-readable information carrier is capable of being decoded into a video signal by the respective decoding method.

In another aspect, in general, for each respective decoding method, a decoder comprises means for performing the respective decoding method.

In another aspect, in general, for each respective decoding method, an encoder comprises means for generating a bitstream capable of being decoded into a video signal by the respective decoding method.

Other features and advantages will become apparent from the following description, and from the claims.

DESCRIPTION OF DRAWINGS

FIG. 1 is an exemplary decoding apparatus.

DESCRIPTION

In order to effectively handle a multiview sequence, an input bitstream includes information that allows a decoding apparatus to determine whether the input bitstream relates to a multiview profile. In cases where it is determined that the input bitstream relates to the multiview profile, supplementary information associated with the multiview sequence is added according to a syntax to the bitstream and transmitted to the decoder. For example, the multiview profile ID can indicate a profile mode for handling multiview video data according to an amendment of the H.264/AVC standard.

The MVC (Multiview Video Coding) technology is an amendment technology of the H.264/AVC standards. That is, a specific syntax is added as supplementary information for an MVC mode. Such amendment to support MVC technology can be more effective than an alternative in which an unconditional syntax is used. For example, if the profile identifier of the AVC technology is indicative of a multiview profile, the addition of multiview sequence information may increase coding efficiency.

The sequence parameter set (SPS) of an H.264/AVC bitstream is indicative of header information including information (e.g., a profile, and a level) associated with the entire-sequence encoding.

The entire compressed moving images (i.e., a sequence) can begin at a sequence header, such that a sequence parameter set (SPS) corresponding to the header information arrives at the decoder earlier than data referred to by the parameter set. As a result, the sequence parameter set RBSP acts as header information of a compressed data of moving images at entry S1 (FIG. 1). If the bitstream is received, the profile ID information “profile_idc” identifies which one of several profiles corresponds to the received bitstream.

The profile ID information “profile_idc” can be set, for example, to “MULTI_VIEW_PROFILE”, so that the syntax including the profile ID information can determine whether the received bitstream relates to a multiview profile. The following configuration information can be added when the received bitstream relates to the multiview profile.

FIG. 1 is a block diagram illustrating an exemplary decoding apparatus (or “decoder”) of a multiview video system for decoding a video signal containing a multiview video sequence. The multiview video system includes a corresponding encoding apparatus (or “encoder”) to provide the multiview video sequence as a bitstream that includes encoded image data embodied on a machine-readable information carrier (e.g., a machine-readable storage medium, or a machine-readable energy signal propagating between a transmitter and receiver).

Referring to FIG. 1, the decoding apparatus includes a parsing unit 10, an entropy decoding unit 11, an Inverse Quantization/Inverse Transform unit 12, an inter-prediction unit 13, an intra-prediction unit 14, a deblocking filter 15, and a decoded-picture buffer 16.

The inter-prediction unit 13 includes a motion compensation unit 17, an illumination compensation unit 18, and an illumination-compensation offset prediction unit 19.

The parsing unit 10 performs a parsing of the received video sequence in NAL units to decode the received video sequence. Typically, one or more sequence parameter sets and picture parameter sets are transmitted to a decoder before a slice header and slice data are decoded. In this case, the NAL header or an extended area of the NAL header may include a variety of configuration information, for example, temporal level information, view level information, anchor picture ID information, and view ID information.

The term “temporal level information” is indicative of hierarchical-structure information for providing temporal scalability from a video signal, such that sequences of a variety of time zones can be provided to a user via the above-mentioned temporal level information.

The term “view level information” is indicative of hierarchical-structure information for providing view scalability from the video signal. The multiview video sequence can define the temporal level and view level, such that a variety of temporal sequences and view sequences can be provided to the user according to the defined temporal level and view level.

In this way, if the level information is defined as described above, the user may employ the temporal scalability and the view scalability. Therefore, the user can view a sequence corresponding to a desired time and view, or can view a sequence corresponding to another limitation. The above-mentioned level information may also be established in various ways according to reference conditions. For example, the level information may be changed according to a camera location, and may also be changed according to a camera arrangement type. In addition, the level information may also be arbitrarily established without a special reference.
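As one possible illustration of temporal and view scalability, a sub-sequence can be selected by keeping only the units whose levels do not exceed chosen limits. The sketch below assumes that lower level values denote more basic layers and that units at or below a given level do not refer to units above it; the type NalUnitInfo, its field names, and the function select_units are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class NalUnitInfo:
    """Configuration information carried with a NAL unit (hypothetical)."""
    temporal_level: int
    view_level: int
    view_id: int


def select_units(units: Iterable[NalUnitInfo],
                 max_temporal_level: int,
                 max_view_level: int) -> List[NalUnitInfo]:
    """Keep the units needed for the requested temporal and view levels."""
    return [
        u for u in units
        if u.temporal_level <= max_temporal_level
        and u.view_level <= max_view_level
    ]


units = [
    NalUnitInfo(temporal_level=0, view_level=0, view_id=0),
    NalUnitInfo(temporal_level=1, view_level=0, view_id=0),
    NalUnitInfo(temporal_level=0, view_level=1, view_id=1),
    NalUnitInfo(temporal_level=1, view_level=1, view_id=1),
]
base_only = select_units(units, max_temporal_level=0, max_view_level=0)
```

Raising either limit in such a sketch adds the corresponding temporal or view layers to the selected sequence.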

The term “anchor picture” is indicative of an encoded picture in which all slices refer to only slices in a current view and not slices in other views. A random access between views can be used for multiview-sequence decoding.

Anchor picture ID information can be used to perform the random access process to access data of a specific view without requiring a large amount of data to be decoded.

The term “view ID information” is indicative of specific information for discriminating between a picture of a current view and a picture of another view. In order to discriminate one picture from other pictures when the video sequence signal is encoded, a Picture Order Count (POC) and frame number information (frame_num) can be used.

If a current sequence is determined to be a multiview video sequence, inter-view prediction can be performed. An identifier is used to discriminate a picture of the current view from a picture of another view.

A view identifier can be defined to indicate a picture's view. The decoding apparatus can obtain information of a picture in a view different from a view of the current picture using the above-mentioned view identifier, such that it can decode the video signal using the information of the picture. The above-mentioned view identifier can be applied to the overall encoding/decoding process of the video signal. The view identifier can also be applied to the multiview video coding process using the frame number information “frame_num” considering a view.

Typically, the multiview sequence has a large amount of data, and a hierarchical encoding function of each view (also called a “view scalability”) can be used for processing the large amount of data. In order to perform the view scalability function, a prediction structure considering views of the multiview sequence may be defined.

The above-mentioned prediction structure may be defined by structuralizing the prediction order or direction of several view sequences. For example, if several view sequences to be encoded are given, a center location of the overall arrangement is set to a base view, such that view sequences to be encoded can be hierarchically selected. The end of the overall arrangement or other parts may be set to the base view.

If the number of camera views is denoted by an exponential power of “2”, a hierarchical prediction structure between several view sequences may be formed on the basis of the above-mentioned case of the camera views denoted by the exponential power of “2”. Otherwise, if the number of camera views is not denoted by the exponential power of “2”, virtual views can be used, and the prediction structure may be formed on the basis of the virtual views. If the camera arrangement is indicative of a two-dimensional arrangement, the prediction order may be established by turns in a horizontal or vertical direction.
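As a small worked example of the power-of-two case, the number of virtual views needed can be computed by rounding the number of camera views up to the next power of two. The text does not state how many virtual views are used or where they are placed, so the rounding rule and the function name virtual_views_needed are assumptions for illustration.

```python
def virtual_views_needed(num_views: int) -> int:
    """Return how many virtual views pad num_views up to a power of two."""
    if num_views < 1:
        raise ValueError("at least one camera view is required")
    padded = 1
    while padded < num_views:
        padded *= 2  # grow to the next power of two
    return padded - num_views
```

Under this assumption, eight camera views need no virtual views, whereas five camera views would be padded with three virtual views to reach eight.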

A parsed bitstream is entropy-decoded by an entropy decoding unit 11, and data such as a coefficient of each macroblock, a motion vector, etc., are extracted. The inverse quantization/inverse transform unit 12 multiplies a received quantization value by a predetermined constant to acquire a transformed coefficient value, and performs an inverse transform of the acquired coefficient value, such that it reconstructs a pixel value. The inter-prediction unit 13 performs an inter-prediction function from decoded samples of the current picture using the reconstructed pixel value.

At the same time, the deblocking filter 15 is applied to each decoded macroblock to reduce the degree of block distortion. The deblocking filter 15 performs a smoothing of the block edge, such that it improves an image quality of the decoded frame. The selection of a filtering process is dependent on a boundary strength and a gradient of image samples arranged in the vicinity of the boundary. The filtered pictures are stored in the decoded picture buffer 16, such that they can be outputted or be used as reference pictures.

The decoded picture buffer 16 stores or outputs pre-coded pictures to perform the inter-prediction function. In this case, frame number information “frame_num” and POC (Picture Order Count) information of the pictures are used to store or output the pre-coded pictures. Pictures of other views may exist in the above-mentioned pre-coded pictures in the case of the MVC technology. Therefore, in order to use the above-mentioned pictures as reference pictures, not only the “frame_num” and POC information, but also a view identifier indicating a picture view may be used as necessary.
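One way to picture the role of the view identifier is a buffer whose entries are keyed by view, “frame_num”, and POC together. The sketch below is illustrative only: the class and method names are hypothetical, and it assumes that pictures of different views sharing a POC value lie at the same temporal position.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PictureKey:
    """Identifies a stored picture by view, frame_num, and POC."""
    view_id: int
    frame_num: int
    poc: int


class DecodedPictureBuffer:
    """Minimal keyed store for decoded pictures (hypothetical sketch)."""

    def __init__(self) -> None:
        self._pictures: Dict[PictureKey, bytes] = {}

    def store(self, key: PictureKey, picture: bytes) -> None:
        self._pictures[key] = picture

    def get(self, key: PictureKey) -> bytes:
        return self._pictures[key]

    def inter_view_candidates(self, current: PictureKey) -> List[PictureKey]:
        """Keys of stored pictures at the same POC but in other views."""
        return sorted(
            (k for k in self._pictures
             if k.poc == current.poc and k.view_id != current.view_id),
            key=lambda k: k.view_id,
        )


dpb = DecodedPictureBuffer()
dpb.store(PictureKey(view_id=0, frame_num=3, poc=6), b"view0")
dpb.store(PictureKey(view_id=1, frame_num=3, poc=6), b"view1")
dpb.store(PictureKey(view_id=0, frame_num=2, poc=4), b"older")
candidates = dpb.inter_view_candidates(PictureKey(view_id=2, frame_num=3, poc=6))
```

Without the view identifier in the key, the first two pictures in this example would be indistinguishable, since they share the same “frame_num” and POC.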

The inter-prediction unit 13 performs the inter-prediction using the reference pictures stored in the decoded picture buffer 16. The inter-coded macroblock may be divided into macroblock partitions. Each macroblock partition can be predicted by one or two reference pictures.

The motion compensation unit 17 compensates for a motion of the current block using the information received from the entropy decoding unit 11. The motion compensation unit 17 extracts motion vectors of neighboring blocks of the current block from the video signal, and obtains a motion-vector predictor of the current block. The motion compensation unit 17 compensates for the motion of the current block using the obtained motion-vector predictor and a difference value, extracted from the video signal, between the motion vector and the predictor. The above-mentioned motion compensation may be performed by only one reference picture, or may also be performed by a plurality of reference pictures.
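The reconstruction of a motion vector from a predictor and a transmitted difference value can be sketched as follows. The text does not state how the predictor is derived from the neighboring blocks; the component-wise median used here is an assumption borrowed from common H.264/AVC practice, and the function names are hypothetical.

```python
from statistics import median_low
from typing import Sequence, Tuple

MotionVector = Tuple[int, int]


def predict_motion_vector(neighbors: Sequence[MotionVector]) -> MotionVector:
    """Component-wise median of the neighboring motion vectors (assumed rule)."""
    if not neighbors:
        return (0, 0)
    # median_low always returns one of the input values, so the result
    # stays an integer even for an even number of neighbors.
    return (
        median_low([mv[0] for mv in neighbors]),
        median_low([mv[1] for mv in neighbors]),
    )


def reconstruct_motion_vector(predictor: MotionVector,
                              difference: MotionVector) -> MotionVector:
    """Add the transmitted difference value to the predictor."""
    return (predictor[0] + difference[0], predictor[1] + difference[1])


neighbors = [(4, -2), (6, 0), (5, 3)]
predictor = predict_motion_vector(neighbors)
motion_vector = reconstruct_motion_vector(predictor, (1, -1))
```

Only the difference value needs to be carried in the video signal in this sketch, since the decoder derives the same predictor from neighbors it has already decoded.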

Therefore, if the above-mentioned reference pictures are determined to be pictures of other views different from the current view, the motion compensation may be performed according to a view identifier indicating the other views.

A direct mode is indicative of a coding mode for predicting motion information of the current block on the basis of the motion information of a block which is completely decoded. The above-mentioned direct mode can reduce the number of bits required for encoding the motion information, resulting in increased compression efficiency.

For example, a temporal direct mode predicts motion information of the current block using a correlation of motion information of a temporal direction. Similar to the temporal direct mode, the decoder can predict the motion information of the current block using a correlation of motion information of a view direction.

If the received bitstream corresponds to a multiview sequence, view sequences may be captured by different cameras respectively, such that a difference in illumination may occur due to internal or external factors of the cameras. In order to reduce potential inefficiency associated with the difference in illumination, an illumination compensation unit 18 performs an illumination compensation function.

In the case of performing the illumination compensation function, flag information may be used to indicate whether an illumination compensation at a specific level of a video signal is performed. For example, the illumination compensation unit 18 may perform the illumination compensation function using flag information indicating whether the illumination compensation of a corresponding slice or macroblock is performed. Also, the above-mentioned method for performing the illumination compensation using the above-mentioned flag information may be applied to a variety of macroblock types (e.g., an inter 16×16 mode, a B-skip mode, a direct mode, etc.).

In order to reconstruct the current block when performing the illumination compensation, information of a neighboring block or information of a block in views different from a view of the current block may be used, and an offset value of the current block may also be used.

In this case, the offset value of the current block is indicative of a difference value between an average pixel value of the current block and an average pixel value of a reference block corresponding to the current block. As an example for using the above-mentioned offset value, a predictor of the current-block offset value may be obtained by using the neighboring blocks of the current block, and a residual value between the offset value and the predictor may be used. Therefore, the decoder can reconstruct the offset value of the current block using the residual value and the predictor.
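The relationship among the offset value, the predictor, and the residual value can be shown with a short numerical sketch. Blocks are represented here as lists of pixel rows, and the function names are hypothetical; the arithmetic follows the definitions in the preceding paragraph.

```python
from typing import List, Sequence

Block = Sequence[Sequence[int]]


def average_pixel_value(block: Block) -> float:
    """Mean of all pixel values in a block."""
    pixels: List[int] = [p for row in block for p in row]
    return sum(pixels) / len(pixels)


def offset_value(current: Block, reference: Block) -> float:
    """Average of the current block minus average of the reference block."""
    return average_pixel_value(current) - average_pixel_value(reference)


def residual_value(offset: float, predictor: float) -> float:
    """Encoder side: the part of the offset not explained by the predictor."""
    return offset - predictor


def reconstruct_offset(residual: float, predictor: float) -> float:
    """Decoder side: the sum of the predictor and the residual value."""
    return predictor + residual


current_block = [[110, 112], [114, 116]]    # average 113
reference_block = [[100, 102], [104, 106]]  # average 103
offset = offset_value(current_block, reference_block)
residual = residual_value(offset, predictor=8.0)
restored = reconstruct_offset(residual, predictor=8.0)
```

In this example the offset is 10, the predictor obtained from neighboring blocks is taken to be 8, and only the residual value of 2 would need to be conveyed for the decoder to recover the offset.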

In order to obtain the predictor of the current block, information of the neighboring blocks may be used as necessary.

For example, the offset value of the current block can be predicted by using the offset value of a neighboring block. Prior to predicting the current-block offset value, it is determined whether the reference index of the current block is equal to a reference index of the neighboring blocks. According to the determined result, the illumination compensation unit 18 can determine which one of neighboring blocks will be used or which value will be used.
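A possible selection rule consistent with the description can be sketched as follows. The text does not fix the exact rule, so the sketch makes several assumptions: neighbors are examined in the order left, upper, right-upper, left-upper; the first neighbor that has illumination compensation enabled and the same reference index as the current block supplies the predictor; otherwise the median of the usable neighbors' offset values is taken; and zero is used when no neighbor is usable. The type Neighbor and the function predict_offset are hypothetical.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional, Sequence


@dataclass(frozen=True)
class Neighbor:
    """Per-block information consulted for offset prediction (hypothetical)."""
    ic_flag: bool   # whether illumination compensation is performed
    ref_idx: int    # reference index of the neighboring block
    offset: float   # offset value of the neighboring block


def predict_offset(current_ref_idx: int,
                   neighbors: Sequence[Optional[Neighbor]]) -> float:
    """Predict the current block's offset from its neighbors.

    neighbors is ordered left, upper, right-upper, left-upper; an entry
    is None when that neighbor is unavailable.
    """
    usable = [n for n in neighbors if n is not None and n.ic_flag]
    # Prefer the first neighbor that shares the current reference index.
    for n in usable:
        if n.ref_idx == current_ref_idx:
            return n.offset
    # Otherwise combine the offsets of all usable neighbors.
    if usable:
        return float(median(n.offset for n in usable))
    return 0.0


neighbors = [
    Neighbor(ic_flag=True, ref_idx=1, offset=4.0),   # left
    Neighbor(ic_flag=True, ref_idx=0, offset=6.0),   # upper
    None,                                            # right-upper
    Neighbor(ic_flag=False, ref_idx=0, offset=9.0),  # left-upper
]
single = predict_offset(0, neighbors)
combined = predict_offset(2, neighbors)
```

Here the upper neighbor alone supplies the predictor when the current reference index is 0, whereas for reference index 2 no neighbor matches and the offsets of the two usable neighbors are combined.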

The illumination compensation unit 18 may perform the illumination compensation using a prediction type of the current block. If the current block is predictively encoded by two reference blocks, the illumination compensation unit 18 may obtain an offset value corresponding to each reference block using the offset value of the current block.

As described above, the inter-predicted pictures or intra-predicted pictures acquired by the illumination compensation and motion compensation are selected according to a prediction mode, and are used to reconstruct the current picture.

Examples of various aspects and features of the system are described in more detail in concurrently filed applications: U.S. application Ser. No. 11/622,611 titled “PROCESSING MULTIVIEW VIDEO”, and U.S. application Ser. No. 11/622,709 titled “PROCESSING MULTIVIEW VIDEO”, each of which is incorporated herein by reference.

What is claimed is:
1. A method for decoding multi-view video data in a multi-view video data stream, with a decoder, the method comprising: obtaining, with a Network Abstraction Layer parsing unit, identification information representing the multi-view video data stream including inter-view prediction structure information of a random access picture, all slices in the random access picture referring only to slices having a same temporal position and being in a different view of the multi-view video data; obtaining, with the Network Abstraction Layer parsing unit, inter-view prediction structure information of the random access picture from the multi-view video data stream based on the identification information, the inter-view prediction structure information indicating a reference relation between inter-view reference pictures; determining, with a decoded picture buffer unit, a reference picture list of a current slice for inter-view prediction using the inter-view prediction structure information of the random access picture; determining, with an inter-prediction unit, a prediction value of a macroblock in the current slice based on the determined reference picture list for inter-view prediction; and decoding the macroblock in the current slice using the prediction value, wherein the multi-view video data includes video data of a base view and an ancillary view, the base view indicating a view being decoded independently of other views without using inter-view prediction, and the ancillary view being a view other than the base view, wherein the inter-view reference pictures are identified by decoding order information between pictures, output order information between pictures, and view information identifying a view of each picture, wherein the decoder includes the Network Abstraction Layer parsing unit, the decoded picture buffer unit, and the inter-prediction unit.
2. The method of claim 1, wherein the inter-view prediction structure information includes number information and view identification information, the number information indicating a total number of views in the multi-view video data, and the view identification information providing a view identifier of each reference view in the multi-view video data.
3. The method of claim 1, wherein the inter-view prediction structure information of the random access picture is obtained by considering a predictive direction.
4. The method of claim 3, wherein the predictive direction represents a forward direction or a backward direction in an output order of pictures.
5. The method of claim 1, wherein the ancillary view is decoded by referring to the base view.
6. The method of claim 1, wherein the inter-view prediction structure information is obtained from sequence parameter set information of a multi-view video.
 7. An apparatus for decodingmulti-view video data in a multi-view video data stream, comprising: aNetwork Abstraction Layer parsing unit obtaining identificationinformation representing the multi-view video data stream includinginter-view prediction structure information of a random access picture,all slices in the random access picture referring only to slices havinga same temporal position and being in a different view of the multi-viewvideo data, and obtaining inter-view prediction structure information ofthe random access picture from the multi-view video data stream based onthe identification information, the inter-view prediction structureinformation indicating a reference relation between inter-view referencepictures; a decoded picture buffer unit determining a reference picturelist of a current slice for inter-view prediction using the inter-viewprediction structure information of the random access picture; and aninter-prediction unit determining a prediction value of a macroblock inthe current slice based on the determined reference picture list forinter-view prediction, and decoding the macroblock in the current sliceusing the prediction value, wherein the multi-view video data includesvideo data of a base view and an ancillary view, the base viewindicating a view being decoded independently of other views withoutusing inter-view prediction, and the ancillary view being a view otherthan the base view, wherein the inter-view reference pictures areidentified by decoding order information between pictures, output orderinformation between pictures and view information identifying a view ofeach picture.
 8. The apparatus of claim 7, wherein the inter-view prediction structure information includes number information and view identification information, the number information indicating a total number of views in the multi-view video data, and the view identification information providing a view identifier of each reference view in the multi-view video data.
 9. The apparatus of claim 7, wherein the inter-view prediction structure information of the random access picture is obtained by considering a predictive direction.
 10. The apparatus of claim 9, wherein the predictive direction represents a forward direction or a backward direction in an output order of pictures.
 11. The apparatus of claim 7, wherein the ancillary view is decoded by referring to the base view.
 12. The apparatus of claim 7, wherein the inter-view prediction structure information is obtained from sequence parameter set information of a multi-view video.
 13. A method for decoding multi-view video data in a multi-view video data stream, with a decoder, the method comprising: obtaining, with a Network Abstraction Layer parsing unit, identification information representing the multi-view video data stream including inter-view prediction structure information of a random access picture, all slices in the random access picture referring only to slices having a same temporal position and being in a different view of the multi-view video data; obtaining, with the Network Abstraction Layer parsing unit, inter-view prediction structure information of the random access picture from the multi-view video data stream based on the identification information, the inter-view prediction structure information indicating a reference relation between inter-view pictures; determining, with a decoded picture buffer unit, a reference picture list of a current slice for inter-view prediction using the inter-view prediction structure information of the random access picture; determining, with an inter-prediction unit, a prediction value of a macroblock in the current slice based on the determined reference picture list for inter-view prediction; and decoding the macroblock in the current slice using the prediction value, wherein the multi-view video data includes video data of a base view and an ancillary view, the base view indicating a view being decoded independently of other views without using inter-view prediction, and the ancillary view being a view other than the base view, wherein the inter-view pictures are identified by view information identifying a view of each picture, wherein the decoder includes the Network Abstraction Layer parsing unit, the decoded picture buffer unit, and the inter-prediction unit.
 14. The method of claim 13, wherein the inter-view prediction structure information includes number information and view identification information, the number information indicating a total number of views in the multi-view video data, and the view identification information providing a view identifier of each reference view in the multi-view video data.
 15. The method of claim 13, wherein the inter-view prediction structure information of the random access picture is obtained by considering a predictive direction.
 16. The method of claim 15, wherein the predictive direction represents a forward direction or a backward direction in an output order of pictures.
 17. The method of claim 13, wherein the ancillary view is decoded by referring to the base view.
 18. The method of claim 13, wherein the inter-view prediction structure information is obtained from sequence parameter set information of a multi-view video.
 19. An apparatus for decoding multi-view video data in a multi-view video data stream, comprising: a Network Abstraction Layer parsing unit obtaining identification information representing the multi-view video data stream including inter-view prediction structure information of a random access picture, all slices in the random access picture referring only to slices having a same temporal position and being in a different view of the multi-view video data, and obtaining inter-view prediction structure information of the random access picture from the multi-view video data stream based on the identification information, the inter-view prediction structure information indicating a reference relation between inter-view pictures; a decoded picture buffer unit determining a reference picture list of a current slice for inter-view prediction using the inter-view prediction structure information of the random access picture; and an inter-prediction unit determining a prediction value of a macroblock in the current slice based on the determined reference picture list for inter-view prediction, and decoding the macroblock in the current slice using the prediction value, wherein the multi-view video data includes video data of a base view and an ancillary view, the base view indicating a view being decoded independently of other views without using inter-view prediction, and the ancillary view being a view other than the base view, wherein the inter-view pictures are identified by view information identifying a view of each picture.
 20. The apparatus of claim 19, wherein the inter-view prediction structure information includes number information and view identification information, the number information indicating a total number of views in the multi-view video data, and the view identification information providing a view identifier of each reference view in the multi-view video data.
 21. The apparatus of claim 19, wherein the inter-view prediction structure information of the random access picture is obtained by considering a predictive direction.
 22. The apparatus of claim 21, wherein the predictive direction represents a forward direction or a backward direction in an output order of pictures.
 23. The apparatus of claim 19, wherein the ancillary view is decoded by referring to the base view.
 24. The apparatus of claim 19, wherein the inter-view prediction structure information is obtained from sequence parameter set information of a multi-view video.